Investigating the Opacity of Verb-Noun Multiword Expression Usages in Context
Abstract
This study investigates the supervised, token-based identification of multiword expressions (MWEs). It is part of ongoing research that exploits the information contained in the contexts in which different instances of an expression occur. This information is used to decide whether a given usage of an expression is literal or an MWE. Lexical and syntactic context features derived from vector representations are shown to be more effective than traditional statistical measures at identifying MWE tokens.
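As a rough illustration of what token-level classification from context vectors can look like, the sketch below averages the vectors of the words surrounding a usage and assigns the label of the nearest class centroid. It is not the study's method: the nearest-centroid classifier, the two-dimensional `TOY_VECTORS`, the `TRAINING` examples, and all function names are assumptions made for this example only.

```python
from math import sqrt

# Hypothetical 2-dimensional word vectors, invented for illustration.
# A real system would load pretrained embeddings instead.
TOY_VECTORS = {
    "water": [1.0, 0.0],
    "floor": [0.9, 0.1],
    "spilled": [0.8, 0.2],
    "died": [0.0, 1.0],
    "funeral": [0.1, 0.9],
    "illness": [0.2, 0.8],
}


def context_vector(context_words):
    """Average the vectors of the context words that have an embedding."""
    known = [TOY_VECTORS[w] for w in context_words if w in TOY_VECTORS]
    if not known:
        return [0.0, 0.0]
    return [sum(column) / len(known) for column in zip(*known)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def train_centroids(labelled_contexts):
    """Compute one mean context vector per label from annotated usages."""
    by_label = {}
    for words, label in labelled_contexts:
        by_label.setdefault(label, []).append(context_vector(words))
    return {
        label: [sum(column) / len(vectors) for column in zip(*vectors)]
        for label, vectors in by_label.items()
    }


def classify(context_words, centroids):
    """Return the label whose centroid is most similar to the context."""
    vector = context_vector(context_words)
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))


# Hypothetical annotated usages of one verb-noun expression:
# each item is (context words around the usage, label).
TRAINING = [
    (["water", "spilled"], "literal"),
    (["floor", "water"], "literal"),
    (["died", "illness"], "mwe"),
    (["funeral", "died"], "mwe"),
]
CENTROIDS = train_centroids(TRAINING)
```

Calling `classify(["floor", "spilled"], CENTROIDS)` assigns the usage to whichever class centroid its averaged context vector is closest to. Richer lexical and syntactic features, or a different supervised learner, would slot into `context_vector` and `classify` respectively.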
Similar Resources
A Word Embedding Approach to Identifying Verb-Noun Idiomatic Combinations
Verb–noun idiomatic combinations (VNICs) are idioms consisting of a verb with a noun in its direct object position. Usages of these expressions can be ambiguous between an idiomatic usage and a literal combination. In this paper we propose supervised and unsupervised approaches, based on word embeddings, to identifying token instances of VNICs. Our proposed supervised and unsupervised approache...
The VNC-Tokens Dataset
Idiomatic expressions formed from a verb and a noun in its direct object position are a productive cross-lingual class of multiword expressions, which can be used both idiomatically and as a literal combination. This paper presents the VNC-Tokens dataset, a resource of almost 3000 English verb–noun combination usages annotated as to whether they are literal or idiomatic. Previous research using...
Unsupervised Classification of Verb Noun Multi-Word Expression Tokens
We address the problem of classifying multiword expression tokens in running text. We focus our study on Verb-Noun Constructions (VNC) that vary in their idiomaticity depending on context. VNC tokens are classified as either idiomatic or literal. Our approach hinges upon the assumption that a literal VNC will have more in common with its component words than an idiomatic one. Commonality is mea...
Using Distributional Similarity of Multi-way Translations to Predict Multiword Expression Compositionality
We predict the compositionality of multiword expressions using distributional similarity between each component word and the overall expression, based on translations into multiple languages. We evaluate the method over English noun compounds, English verb particle constructions and German noun compounds. We show that the estimation of compositionality is improved when using translations into m...
Treatment of Multiword Expressions and Compounds in Bulgarian
The paper shows that catena representation together with valence information can provide a good way of encoding Multiword Expressions (beyond idioms). It also discusses a strategy for mapping noun/verb compounds to their counterpart syntactic phrases. The data on Multiword Expressions come from BulTreeBank, while the data on compounds come from a morphological dictionary of Bulgarian.